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        End-to-end Performance Implications of Links with Errors

Status of this Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

   This document discusses the specific TCP mechanisms that are
   problematic in environments with high uncorrected error rates, and
   discusses what can be done to mitigate the problems without
   introducing intermediate devices into the connection.
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1.0 Introduction

   The rapidly-growing Internet is being accessed by an increasingly
   wide range of devices over an increasingly wide variety of links.  At
   least some of these links do not provide the degree of reliability
   that hosts expect, and this expansion into unreliable links causes
   some Internet protocols, especially TCP [RFC793], to perform poorly.

   Specifically, TCP congestion control [RFC2581], while appropriate for
   connections that lose traffic primarily because of congestion and
   buffer exhaustion, interacts badly with uncorrected errors when TCP
   connections traverse links with high uncorrected error rates.  The
   result is that sending TCPs may spend an excessive amount of time
   waiting for acknowledgement that do not arrive, and then, although
   these losses are not due to congestion-related buffer exhaustion, the
   sending TCP transmits at substantially reduced traffic levels as it
   probes the network to determine "safe" traffic levels.

   This document does not address issues with other transport protocols,
   for example, UDP.

   Congestion avoidance in the Internet is based on an assumption that
   most packet losses are due to congestion.  TCP's congestion avoidance
   strategy treats the absence of acknowledgement as a congestion
   signal.  This has worked well since it was introduced in 1988 [VJ-
   DCAC], because most links and subnets have relatively low error rates
   in normal operation, and congestion is the primary cause of loss in
   these environments.  However, links and subnets that do not enjoy low
   uncorrected error rates are becoming more prevalent in parts of the
   Internet.  In particular, these include terrestrial and satellite
   wireless links.  Users relying on traffic traversing these links may
   see poor performance because their TCP connections are spending
   excessive time in congestion avoidance and/or slow start procedures
   triggered by packet losses due to transmission errors.

   The recommendations in this document aim at improving utilization of
   available path capacity over such high error-rate links in ways that
   do not threaten the stability of the Internet.

   Applications use TCP in very different ways, and these have
   interactions with TCP's behavior [RFC2861].  Nevertheless, it is
   possible to make some basic assumptions about TCP flows.
   Accordingly, the mechanisms discussed here are applicable to all uses
   of TCP, albeit in varying degrees according to different scenarios
   (as noted where appropriate).
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   This recommendation is based on the explicit assumption that major
   changes to the entire installed base of routers and hosts are not a
   practical possibility.  This constrains any changes to hosts that are
   directly affected by errored links.

1.1 Should you be reading this recommendation?

   All known subnetwork technologies provide an "imperfect" subnetwork
   service - the bit error rate is non-zero.  But there's no obvious way
   for end stations to tell the difference between packets discarded due
   to congestion and losses due to transmission errors.

   If a directly-attached subnetwork is reporting transmission errors to
   a host, these reports matter, but we can't rely on explicit
   transmission error reports to both hosts.

   Another way of deciding if a subnetwork should be considered to have
   a "high error rate" is by appealing to mathematics.

   An approximate formula for the TCP Reno response function is given in
   [PFTK98]:

                                s
   T = --------------------------------------------------
       RTT*sqrt(2p/3) + tRTO*(3*sqrt(3p/8))*p*(1 + 32p**2)

   where

       T = the sending rate in bytes per second
       s = the packet size in bytes
       RTT = round-trip time in seconds
       tRTO = TCP retransmit timeout value in seconds
       p = steady-state packet loss rate

   If one plugs in an observed packet loss rate, does the math and then
   sees predicted bandwidth utilization that is greater than the link
   speed, the connection will not benefit from recommendations in this
   document, because the level of packet losses being encountered won't
   affect the ability of TCP to utilize the link.  If, however, the
   predicted bandwidth is less than the link speed, packet losses are
   affecting the ability of TCP to utilize the link.

   If further investigation reveals a subnetwork with significant
   transmission error rates, the recommendations in this document will
   improve the ability of TCP to utilize the link.
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   A few caveats are in order, when doing this calculation:

   (1)   the RTT is the end-to-end RTT, not the link RTT.
   (2)   Max(1.0, 4*RTT) can be substituted as a simplification for
         tRTO.
   (3)   losses may be bursty - a loss rate measured over an interval
         that includes multiple bursty loss events may understate the
         impact of these loss events on the sending rate.

1.2 Relationship of this recommendation to PEPs

   This document discusses end-to-end mechanisms that do not require
   TCP-level awareness by intermediate nodes.  This places severe
   limitations on what the end nodes can know about the nature of losses
   that are occurring between the end nodes.  Attempts to apply
   heuristics to distinguish between congestion and transmission error
   have not been successful [BV97, BV98, BV98a].  This restriction is
   relaxed in an informational document on Performance Enhancing Proxies
   (PEPs) [RFC3135]. Because PEPs can be placed on boundaries where
   network characteristics change dramatically, PEPs have an additional
   opportunity to improve performance over links with uncorrected
   errors.

   However, generalized use of PEPs contravenes the end-to-end principle
   and is highly undesirable given their deleterious implications, which
   include the following: lack of fate sharing (a PEP adds a third point
   of failure besides the endpoints themselves), end-to-end reliability
   and diagnostics, preventing end-to-end security (particularly network
   layer security such as IPsec), mobility (handoffs are much more
   complex because state must be transferred), asymmetric routing (PEPs
   typically require being on both the forward and reverse paths of a
   connection), scalability (PEPs add more state to maintain), QoS
   transparency and guarantees.

   Not every type of PEP has all the drawbacks listed above.
   Nevertheless, the use of PEPs may have very serious consequences
   which must be weighed carefully.

1.3 Relationship of this recommendation to Link Layer Mechanisms

   This recommendation is for use with TCP over subnetwork technologies
   (link layers) that have already been deployed.  Subnetworks that are
   intended to carry Internet protocols, but have not been completely
   specified are the subject of a best common practices (BCP) document
   which has been developed or is under development by the Performance
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   Implications of Link Characteristics WG (PILC) [PILC-WEB].  This last
   document is aimed at designers who still have the opportunity to
   reduce the number of uncorrected errors TCP will encounter.

2.0 Errors and Interactions with TCP Mechanisms

   A TCP sender adapts its use of network path capacity based on
   feedback from the TCP receiver.  As TCP is not able to distinguish
   between losses due to congestion and losses due to uncorrected
   errors, it is not able to accurately determine available path
   capacity in the presence of significant uncorrected errors.

2.1 Slow Start and Congestion Avoidance [RFC2581]

   Slow Start and Congestion Avoidance [RFC2581] are essential to the
   current stability of the Internet.  These mechanisms were designed to
   accommodate networks that do not provide explicit congestion
   notification.  Although experimental mechanisms such as [RFC2481] are
   moving in the direction of explicit congestion notification, the
   effect of ECN on ECN-aware TCPs is essentially the same as the effect
   of implicit congestion notification through congestion-related loss,
   except that ECN provides this notification before packets are lost,
   and must then be retransmitted.

   TCP connections experiencing high error rates on their paths interact
   badly with Slow Start and with Congestion Avoidance, because high
   error rates make the interpretation of losses ambiguous - the sender
   cannot know whether detected losses are due to congestion or to data
   corruption.  TCP makes the "safe" choice and assumes that the losses
   are due to congestion.

      -  Whenever sending TCPs receive three out-of-order
         acknowledgement, they assume the network is mildly congested
         and invoke fast retransmit/fast recovery (described below).

      -  Whenever TCP's retransmission timer expires, the sender assumes
         that the network is congested and invokes slow start.

      -  Less-reliable link layers often use small link MTUs.  This
         slows the rate of increase in the sender's window size during
         slow start, because the sender's window is increased in units
         of segments.  Small link MTUs alone don't improve reliability.
         Path MTU discovery [RFC1191] must also be used to prevent
         fragmentation.  Path MTU discovery allows the most rapid
         opening of the sender's window size during slow start, but a
         number of round trips may still be required to open the window
         completely.
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   Recommendation: Any standards-conformant TCP will implement Slow
   Start and Congestion Avoidance, which are MUSTs in STD 3 [RFC1122].
   Recommendations in this document will not interfere with these
   mechanisms.

2.2 Fast Retransmit and Fast Recovery [RFC2581]

   TCP provides reliable delivery of data as a byte-stream to an
   application, so that when a segment is lost (whether due to either
   congestion or transmission loss), the receiver TCP implementation
   must wait to deliver data to the receiving application until the
   missing data is received.  The receiver TCP implementation detects
   missing segments by segments arriving with out-of-order sequence
   numbers.

   TCPs should immediately send an acknowledgement when data is received
   out-of-order [RFC2581], providing the next expected sequence number
   with no delay, so that the sender can retransmit the required data as
   quickly as possible and the receiver can resume delivery of data to
   the receiving application.  When an acknowledgement carries the same
   expected sequence number as an acknowledgement that has already been
   sent for the last in-order segment received, these acknowledgement
   are called "duplicate ACKs".

   Because IP networks are allowed to reorder packets, the receiver may
   send duplicate acknowledgments for segments that arrive out of order
   due to routing changes, link-level retransmission, etc.  When a TCP
   sender receives three duplicate ACKs, fast retransmit [RFC2581]
   allows it to infer that a segment was lost.  The sender retransmits
   what it considers to be this lost segment without waiting for the
   full retransmission timeout, thus saving time.

   After a fast retransmit, a sender halves its congestion window and
   invokes the fast recovery [RFC2581] algorithm, whereby it invokes
   congestion avoidance from a halved congestion window, but does not
   invoke slow start from a one-segment congestion window as it would do
   after a retransmission timeout.  As the sender is still receiving
   dupacks, it knows the receiver is receiving packets sent, so the full
   reduction after a timeout when no communication has been received is
   not called for.  This relatively safe optimization also saves time.

   It is important to be realistic about the maximum throughput that TCP
   can have over a connection that traverses a high error-rate link.  In
   general, TCP will increase its congestion window beyond the delay-
   bandwidth product.  TCP's congestion avoidance strategy is additive-
   increase, multiplicative-decrease, which means that if additional
   errors are encountered before the congestion window recovers
   completely from a 50-percent reduction, the effect can be a "downward
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   spiral" of the congestion window due to additional 50-percent
   reductions.  Even using Fast Retransmit/Fast Recovery, the sender
   will halve the congestion window each time a window contains one or
   more segments that are lost, and will re-open the window by one
   additional segment for each congestion window's worth of
   acknowledgement received.

   If a connection's path traverses a link that loses one or more
   segments during this recovery period, the one-half reduction takes
   place again, this time on a reduced congestion window - and this
   downward spiral will continue to hold the congestion window below
   path capacity until the connection is able to recover completely by
   additive increase without experiencing loss.

   Of course, no downward spiral occurs if the error rate is constantly
   high and the congestion window always remains small; the
   multiplicative-increase "slow start" will be exited early, and the
   congestion window remains low for the duration of the TCP connection.
   In links with high error rates, the TCP window may remain rather
   small for long periods of time.

   Not all causes of small windows are related to errors.  For example,
   HTTP/1.0 commonly closes TCP connections to indicate boundaries
   between requested resources.  This means that these applications are
   constantly closing "trained" TCP connections and opening "untrained"
   TCP connections which will execute slow start, beginning with one or
   two segments.  This can happen even with HTTP/1.1, if webmasters
   configure their HTTP/1.1 servers to close connections instead of
   waiting to see if the connection will be useful again.

   A small window - especially a window of less than four segments -
   effectively prevents the sender from taking advantage of Fast
   Retransmits.  Moreover, efficient recovery from multiple losses
   within a single window requires adoption of new proposals (NewReno
   [RFC2582]).

   Recommendation: Implement Fast Retransmit and Fast Recovery at this
   time.  This is a widely-implemented optimization and is currently at
   Proposed Standard level.  [RFC2488] recommends implementation of Fast
   Retransmit/Fast Recovery in satellite environments.

2.3 Selective Acknowledgements [RFC2018, RFC2883]

   Selective Acknowledgements [RFC2018] allow the repair of multiple
   segment losses per window without requiring one (or more) round-trips
   per loss.
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   [RFC2883] proposes a minor extension to SACK that allows receiving
   TCPs to provide more information about the order of delivery of
   segments, allowing "more robust operation in an environment of
   reordered packets, ACK loss, packet replication, and/or early
   retransmit timeouts".  Unless explicitly stated otherwise, in this
   document, "Selective Acknowledgements" (or "SACK") refers to the
   combination of [RFC2018] and [RFC2883].

   Selective acknowledgments are most useful in LFNs ("Long Fat
   Networks") because of the long round trip times that may be
   encountered in these environments, according to Section 1.1 of
   [RFC1323], and are especially useful if large windows are required,
   because there is a higher probability of multiple segment losses per
   window.

   On the other hand, if error rates are generally low but occasionally
   higher due to channel conditions, TCP will have the opportunity to
   increase its window to larger values during periods of improved
   channel conditions between bursts of errors.  When bursts of errors
   occur, multiple losses within a window are likely to occur.  In this
   case, SACK would provide benefits in speeding the recovery and
   preventing unnecessary reduction of the window size.

   Recommendation: Implement SACK as specified in [RFC2018] and updated
   by [RFC2883], both Proposed Standards.  In cases where SACK cannot be
   enabled for both sides of a connection, TCP senders may use NewReno
   [RFC2582] to better handle partial ACKs and multiple losses within a
   single window.

3.0 Summary of Recommendations

   The Internet does not provide a widely-available loss feedback
   mechanism that allows TCP to distinguish between congestion loss and
   transmission error.  Because congestion affects all traffic on a path
   while transmission loss affects only the specific traffic
   encountering uncorrected errors, avoiding congestion has to take
   precedence over quickly repairing transmission errors.  This means
   that the best that can be achieved without new feedback mechanisms is
   minimizing the amount of time that is spent unnecessarily in
   congestion avoidance.

   The Fast Retransmit/Fast Recovery mechanism allows quick repair of
   loss without giving up the safety of congestion avoidance.  In order
   for Fast Retransmit/Fast Recovery to work, the window size must be
   large enough to force the receiver to send three duplicate
   acknowledgments before the retransmission timeout interval expires,
   forcing full TCP slow-start.
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   Selective Acknowledgements (SACK) extend the benefit of Fast
   Retransmit/Fast Recovery to situations where multiple segment losses
   in the window need to be repaired more quickly than can be
   accomplished by executing Fast Retransmit for each segment loss, only
   to discover the next segment loss.

   These mechanisms are not limited to wireless environments.  They are
   usable in all environments.

4.0 Topics For Further Work

   "Limited Transmit" [RFC3042] has been specified as an optimization
   extending Fast Retransmit/Fast Recovery for TCP connections with
   small congestion windows that will not trigger three duplicate
   acknowledgments.  This specification is deemed safe, and it also
   provides benefits for TCP connections that experience a large amount
   of packet (data or ACK) loss.  Implementors should evaluate this
   standards track specification for TCP in loss environments.

   Delayed Duplicate Acknowledgements [MV97, VMPM99] attempts to prevent
   TCP-level retransmission when link-level retransmission is still in
   progress, adding additional traffic to the network.  This proposal is
   worthy of additional study, but is not recommended at this time,
   because we don't know how to calculate appropriate amounts of delay
   for an arbitrary network topology.

   It is not possible to use explicit congestion notification [RFC2481]
   as a surrogate for explicit transmission error notification (no
   matter how much we wish it was!).  Some mechanism to provide explicit
   notification of transmission error would be very helpful.  This might
   be more easily provided in a PEP environment, especially when the PEP
   is the "first hop" in a connection path, because current checksum
   mechanisms do not distinguish between transmission error to a payload
   and transmission error to the header.  Furthermore, if the header is
   damaged, sending explicit transmission error notification to the
   right endpoint is problematic.

   Losses that take place on the ACK stream, especially while a TCP is
   learning network characteristics, can make the data stream quite
   bursty (resulting in losses on the data stream, as well).  Several
   ways of limiting this burstiness have been proposed, including TCP
   transmit pacing at the sender and ACK rate control within the
   network.

   "Appropriate Byte Counting" (ABC) [ALL99], has been proposed as a way
   of opening the congestion window based on the number of bytes that
   have been successfully transfered to the receiver, giving more
   appropriate behavior for application protocols that initiate
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   connections with relatively short packets.  For SMTP [RFC2821], for
   instance, the client might send a short HELO packet, a short MAIL
   packet, one or more short RCPT packets, and a short DATA packet -
   followed by the entire mail body sent as maximum-length packets.  An
   ABC TCP sender would not use ACKs for each of these short packets to
   increase the congestion window to allow additional full-length
   packets.  ABC is worthy of additional study, but is not recommended
   at this time, because ABC can lead to increased burstiness when
   acknowledgments are lost.

4.1 Achieving, and maintaining, large windows

   The recommendations described in this document will aid TCPs in
   injecting packets into ERRORed connections as fast as possible
   without destabilizing the Internet, and so optimizing the use of
   available bandwidth.

   In addition to these TCP-level recommendations, there is still
   additional work to do at the application level, especially with the
   dominant application protocol on the World Wide Web, HTTP.

   HTTP/1.0 (and earlier versions) closes TCP connections to signal a
   receiver that all of a requested resource had been transmitted.
   Because WWW objects tend to be small in size [MOGUL], TCPs carrying
   HTTP/1.0 traffic experience difficulty in "training" on available
   path capacity (a substantial portion of the transfer has already
   happened by the time TCP exits slow start).

   Several HTTP modifications have been introduced to improve this
   interaction with TCP ("persistent connections" in HTTP/1.0, with
   improvements in HTTP/1.1 [RFC2616]).  For a variety of reasons, many
   HTTP interactions are still HTTP/1.0-style - relatively short-lived.

   Proposals which reuse TCP congestion information across connections,
   like TCP Control Block Interdependence [RFC2140], or the more recent
   Congestion Manager [BS00] proposal, will have the effect of making
   multiple parallel connections impact the network as if they were a
   single connection, "trained" after a single startup transient.  These
   proposals are critical to the long-term stability of the Internet,
   because today's users always have the choice of clicking on the
   "reload" button in their browsers and cutting off TCP's exponential
   backoff - replacing connections which are building knowledge of the
   available bandwidth with connections with no knowledge at all.
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5.0 Security Considerations

   A potential vulnerability introduced by Fast Retransmit/Fast Recovery
   is (as pointed out in [RFC2581]) that an attacker may force TCP
   connections to grind to a halt, or, more dangerously, behave more
   aggressively.  The latter possibility may lead to congestion
   collapse, at least in some regions of the network.

   Selective acknowledgments is believed to neither strengthen nor
   weaken TCP's current security properties [RFC2018].

   Given that the recommendations in this document are performed on an
   end-to-end basis, they continue working even in the presence of end-
   to-end IPsec.  This is in direct contrast with mechanisms such as
   PEP's which are implemented in intermediate nodes (section 1.2).

6.0 IANA Considerations

   This document is a pointer to other, existing IETF standards.  There
   are no new IANA considerations.
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Full Copyright Statement

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
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